Direct Image to Point Cloud Descriptors Matching for 6-DOF Camera Localization in Dense 3D Point Cloud
We propose a novel concept to directly match feature descriptors extracted
from RGB images, with feature descriptors extracted from 3D point clouds. We
use this concept to localize the position and orientation (pose) of the camera
of a query image in dense point clouds. We generate a dataset of matching 2D
and 3D descriptors, and use it to train a proposed Descriptor-Matcher
algorithm. To localize a query image in a point cloud, we extract 2D keypoints
and descriptors from the query image. Then the Descriptor-Matcher is used to
find the corresponding pairs of 2D and 3D keypoints by matching the 2D descriptors
with the pre-extracted 3D descriptors of the point cloud. This information is
used in a robust pose estimation algorithm to localize the query image in the
3D point cloud. Experiments demonstrate that directly matching 2D and 3D
descriptors is not only a viable idea but also achieves competitive accuracy
compared to other state-of-the-art approaches for camera pose localization
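A robust pose estimator typically scores a candidate camera pose by how many 2D-3D matches it explains. The following is a minimal sketch of that inlier check only, assuming a pinhole camera with intrinsics `K` and a world-to-camera pose `(R, t)`. The function name, the pixel threshold, and the pinhole model are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def reprojection_inliers(K, R, t, points_3d, points_2d, threshold_px=2.0):
    """Return a boolean mask of 2D-3D matches consistent with pose (R, t)."""
    # Transform world points into the camera frame: X_cam = R @ X_world + t.
    cam = points_3d @ R.T + t
    # Points at or behind the camera plane can never be valid matches.
    in_front = cam[:, 2] > 1e-9
    # Perspective projection with the intrinsic matrix K.
    proj = cam @ K.T
    depth = np.where(in_front, proj[:, 2], 1.0)  # avoid dividing by bad depths
    pixels = proj[:, :2] / depth[:, None]
    # A match is an inlier if its reprojection error is below the threshold.
    errors = np.linalg.norm(pixels - points_2d, axis=1)
    return in_front & (errors < threshold_px)
```

In a RANSAC-style loop, a check like this would be run for each pose hypothesis, and the hypothesis with the most inliers would be kept.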
Learning and Matching Multi-View Descriptors for Registration of Point Clouds
Critical to the registration of point clouds is the establishment of a set of
accurate correspondences between points in 3D space. The correspondence problem
is generally addressed by the design of discriminative 3D local descriptors on
the one hand, and the development of robust matching strategies on the other
hand. In this work, we first propose a multi-view local descriptor, which is
learned from the images of multiple views, for the description of 3D keypoints.
Then, we develop a robust matching approach that rejects outlier
matches through efficient belief-propagation inference on the defined
graphical model. We demonstrate that our approaches improve
registration on public scanning and multi-view stereo datasets. The
superior performance is verified by extensive comparisons against a
variety of descriptors and matching methods.
Polarimetric Multi-View Inverse Rendering
A polarization camera has great potential for 3D reconstruction since the
angle of polarization (AoP) of reflected light is related to an object's
surface normal. In this paper, we propose a novel 3D reconstruction method
called Polarimetric Multi-View Inverse Rendering (Polarimetric MVIR) that
effectively exploits geometric, photometric, and polarimetric cues extracted
from input multi-view color polarization images. We first estimate camera poses
and an initial 3D model by geometric reconstruction with a standard
structure-from-motion and multi-view stereo pipeline. We then refine the
initial model by optimizing photometric and polarimetric rendering errors using
multi-view RGB and AoP images, where we propose a novel polarimetric rendering
cost function that enables us to effectively constrain each estimated surface
vertex's normal while considering four possible ambiguous azimuth angles
revealed from the AoP measurement. Experimental results using both synthetic
and real data demonstrate that our Polarimetric MVIR can reconstruct a detailed
3D shape without assuming a specific polarized reflection depending on the
material.
Comment: Paper accepted in ECCV 202
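The four ambiguous azimuth angles mentioned above can be illustrated with a small sketch. It assumes the common polarization model in which the AoP is only defined modulo 180 degrees and diffuse and specular reflection differ by 90 degrees, so the candidates are spaced 90 degrees apart. The function names and the error measure are hypothetical and are not the paper's rendering cost.

```python
import numpy as np

def azimuth_candidates(aop):
    """Four candidate azimuths (radians, in [0, 2*pi)) for one AoP value."""
    offsets = np.arange(4) * (np.pi / 2.0)
    return np.mod(aop + offsets, 2.0 * np.pi)

def min_azimuth_error(normal, aop):
    """Smallest angular gap between a normal's azimuth and the candidates."""
    azimuth = np.arctan2(normal[1], normal[0])
    diff = azimuth - azimuth_candidates(aop)
    # Wrap differences into [-pi, pi] before taking magnitudes.
    wrapped = np.arctan2(np.sin(diff), np.cos(diff))
    return float(np.min(np.abs(wrapped)))
```

A cost built this way does not need to decide which of the four candidates is correct; it only penalizes the distance to the nearest one.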
Single-Image Depth Prediction Makes Feature Matching Easier
Good local features improve the robustness of many 3D re-localization and
multi-view reconstruction pipelines. The problem is that viewing angle and
distance severely impact the recognizability of a local feature. Attempts to
improve appearance invariance by choosing better local feature points or by
leveraging outside information have come with prerequisites that made some of
them impractical. In this paper, we propose a surprisingly effective
enhancement to local feature extraction, which improves matching. We show that
CNN-based depths inferred from single RGB images are quite helpful, despite
their flaws. They allow us to pre-warp images and rectify perspective
distortions, to significantly enhance SIFT and BRISK features, enabling more
good matches, even when cameras are looking at the same scene but in opposite
directions.
Comment: 14 pages, 7 figures, accepted for publication at the European
Conference on Computer Vision (ECCV) 202
Capture, Reconstruction, and Representation of the Visual Real World for Virtual Reality
We provide an overview of the concerns, current practice, and limitations for capturing, reconstructing, and representing the real world visually within virtual reality. Given that our goals are to capture, transmit, and depict complex real-world phenomena to humans, these challenges cover the opto-electro-mechanical, computational, informational, and perceptual fields. Practically producing a system for real-world VR capture requires navigating a complex design space and pushing the state of the art in each of these areas. As such, we outline several promising directions for future work to improve the quality and flexibility of real-world VR capture systems
The Glasgow Outcome Scale -- 40 years of application and refinement
The Glasgow Outcome Scale (GOS) was first published in 1975 by Bryan Jennett and Michael Bond. With over 4,000 citations to the original paper, it is the most highly cited outcome measure in studies of brain injury and the second most-cited paper in clinical neurosurgery. The original GOS and the subsequently developed extended GOS (GOSE) are recommended by several national bodies as the outcome measure for major trauma and for head injury. The enduring appeal of the GOS is linked to its simplicity, short administration time, reliability and validity, stability, flexibility of administration (face-to-face, over the telephone and by post), cost-free availability and ease of access. These benefits apply to other derivatives of the scale, including the Glasgow Outcome at Discharge Scale (GODS) and the GOS paediatric revision. The GOS was devised to provide an overview of outcome and to focus on social recovery. Since the initial development of the GOS, there has been an increasing focus on the multidimensional nature of outcome after head injury. This Review charts the development of the GOS, its refinement and usage over the past 40 years, and considers its current and future roles in developing an understanding of brain injury
Learning to solve nonlinear least squares for monocular stereo
Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied, and many problems are inherently ill-posed. In this paper, we propose a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors, as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e., jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem.
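For context, the traditional kind of solver that such a learned optimizer is contrasted with can be sketched as a damped Gauss-Newton (Levenberg-Marquardt style) loop. This is a generic classical baseline under simple assumptions, not the paper's learned method; the function name, the damping schedule, and the iteration count are illustrative choices.

```python
import numpy as np

def levenberg_marquardt(residual_fn, jacobian_fn, x0, iterations=100, damping=1e-3):
    """Minimise 0.5 * ||residual_fn(x)||^2 with damped Gauss-Newton steps."""
    x = np.asarray(x0, dtype=float)
    cost = 0.5 * np.sum(residual_fn(x) ** 2)
    for _ in range(iterations):
        r = residual_fn(x)
        J = jacobian_fn(x)
        # Damped normal equations: (J^T J + damping * I) step = -J^T r.
        A = J.T @ J + damping * np.eye(x.size)
        step = np.linalg.solve(A, -J.T @ r)
        candidate = x + step
        candidate_cost = 0.5 * np.sum(residual_fn(candidate) ** 2)
        if candidate_cost < cost:
            # Accept the step and trust the quadratic model more.
            x, cost = candidate, candidate_cost
            damping = max(damping * 0.1, 1e-12)
        else:
            # Reject the step and move toward small gradient-descent steps.
            damping *= 10.0
        if np.linalg.norm(step) < 1e-12:
            break
    return x
```

The hand-set damping term plays the role of a hand-crafted regularizer here, which is the kind of design choice a learned solver aims to replace.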
Handcrafted Outlier Detection Revisited
Local feature matching is a critical part of many computer vision pipelines, including, among others, Structure-from-Motion, SLAM, and Visual Localization. However, due to limitations in the descriptors, raw matches are often contaminated by a majority of outliers. As a result, outlier detection is a fundamental problem in computer vision, and a wide range of approaches, from simple checks based on descriptor similarity to geometric verification, have been proposed over the last decades. In recent years, deep learning-based approaches to outlier detection have become popular. Unfortunately, the corresponding works rarely compare with strong classical baselines. In this paper, we revisit handcrafted approaches to outlier filtering. Based on best practices, we propose a hierarchical pipeline for effective outlier detection and integrate novel ideas, which together lead to an efficient and competitive approach to outlier rejection. We show that our approach, although not relying on learning, is more than competitive with both recent learned works and handcrafted approaches in terms of efficiency and effectiveness. The code is available at https://github.com/cavalli1234/AdaLAM
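One of the "simple checks based on descriptor similarity" mentioned above is the classical nearest/second-nearest ratio test. The sketch below shows that check in plain NumPy. It is not the AdaLAM pipeline or its API; the function name and the 0.8 ratio are assumptions made for illustration, and `desc_b` is assumed to hold at least two descriptors.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs that pass a nearest/second-nearest ratio test."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    # Keep a match only if it is clearly better than the runner-up.
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]
```

Matches that survive a filter like this would normally still go through geometric verification, since descriptor similarity alone cannot remove every outlier.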
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
In this work we target the problem of estimating accurately localised correspondences between a pair of images. We adopt the recent Neighbourhood Consensus Networks that have demonstrated promising performance for difficult correspondence problems and propose modifications to overcome their main limitations: large memory consumption, large inference time and poorly localised correspondences. Our proposed modifications can reduce the memory footprint and execution time more than , with equivalent results. This is achieved by sparsifying the correlation tensor containing tentative matches, and its subsequent processing with a 4D CNN using submanifold sparse convolutions. Localisation accuracy is significantly improved by processing the input images in higher resolution, which is possible due to the reduced memory footprint, and by a novel two-stage correspondence relocalisation module. The proposed Sparse-NCNet method obtains state-of-the-art results on the HPatches Sequences and InLoc visual localisation benchmarks, and competitive results in the Aachen Day-Night benchmark.
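The idea of sparsifying a correlation tensor can be sketched by keeping only the strongest few correlations for each source feature. The example flattens each feature map to a list of vectors, so the 4D tensor becomes an N x M matrix. The top-k rule, the cosine similarity, and the function name are assumptions made for illustration and may differ from the Sparse-NCNet implementation.

```python
import numpy as np

def sparse_topk_correlation(feat_a, feat_b, k=2):
    """Keep the k strongest correlations per source feature.

    Returns (rows, cols, values) in coordinate (COO) form.
    """
    # Cosine similarity between L2-normalised feature vectors.
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    corr = a @ b.T  # dense (N, M) correlation matrix
    # Column indices of the k largest entries in each row (unordered).
    cols = np.argpartition(-corr, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(corr.shape[0]), k)
    cols = cols.ravel()
    return rows, cols, corr[rows, cols]
```

Storage for the kept entries grows as N x k instead of N x M, which is the kind of saving that makes higher-resolution inputs affordable. Note that this sketch still builds the dense matrix first, so its peak memory is unchanged.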